Clustering based on random graph model embedding vertex features

Authors

  • Hugo Zanghi
  • Stevenn Volant
  • Christophe Ambroise
Abstract

Large datasets with interactions between objects are common to numerous scientific fields (e.g., social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such a dataset is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, ignoring information in the vertex features. In this paper, we provide a clustering algorithm that exploits both types of data, based on a statistical model with latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method on real datasets based on hypertextual documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.
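
The abstract does not spell out the model, so the following is only a minimal sketch of one way a latent-class model could tie connectivity and features together. It assumes (this is not stated in the abstract) that edges follow a stochastic block model and that features are Gaussian with a class-dependent mean. The function name `sample_graph_with_features` and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def sample_graph_with_features(n, proportions, block_probs, means, sigma, seed=0):
    """Sample latent classes, an undirected graph, and vertex features.

    Assumed generative sketch (not taken from the paper):
      - each vertex gets a hidden class z_i drawn from `proportions`;
      - an edge i-j appears with probability block_probs[z_i, z_j];
      - the feature vector of vertex i is Gaussian around means[z_i].
    """
    rng = np.random.default_rng(seed)
    proportions = np.asarray(proportions, dtype=float)
    block_probs = np.asarray(block_probs, dtype=float)
    means = np.asarray(means, dtype=float)

    # Hidden class label of every vertex.
    z = rng.choice(len(proportions), size=n, p=proportions)

    # Edge probability for every pair, looked up from the class labels.
    edge_probs = block_probs[z][:, z]

    # Draw the strict upper triangle, then mirror it: undirected, no self-loops.
    upper = np.triu(rng.random((n, n)) < edge_probs, k=1)
    adjacency = (upper | upper.T).astype(int)

    # Class-dependent Gaussian features with a shared isotropic scale.
    features = means[z] + sigma * rng.standard_normal((n, means.shape[1]))
    return z, adjacency, features


z, A, X = sample_graph_with_features(
    n=60,
    proportions=[0.5, 0.5],
    block_probs=[[0.30, 0.05], [0.05, 0.30]],
    means=[[0.0, 0.0], [3.0, 3.0]],
    sigma=1.0,
)
print(A.shape, X.shape)
```

In a model of this kind, a clustering method would try to recover `z` from `A` and `X` jointly, which is how both sources of information can contribute to the same partition.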

Similar resources

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. Local link prediction approaches based on node structure are fast but are not accurate enough. On the other hand, the global link predicti...

GEMSEC: Graph Embedding with Self Clustering

Modern graph embedding procedures can efficiently extract features of nodes from graphs with millions of nodes. The features are later used as inputs for downstream predictive tasks. In this paper we propose GEMSEC, a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their features. The procedure places nodes in an abstract feature space where the vertex...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding

Vertex clustering in a stochastic blockmodel graph has wide applicability and has been the subject of extensive research. In this paper, we provide a short proof that the adjacency spectral embedding can be used to obtain perfect clustering for the stochastic blockmodel and the degree-corrected stochastic blockmodel. We also show an analogous result for the more general random dot product graph ...

Detecting Overlapping Communities in Social Networks using Deep Learning

In network analysis, a community is typically thought of as a group of nodes with a high density of edges among themselves and a low density of edges relative to other network parts. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...

Journal title:
  • Pattern Recognition Letters

Volume 31

Publication date: 2010